A Continuous-time Markov Decision Process Based Method on Pursuit-Evasion Problem

Author

  • Jia Shengde
Abstract

This paper presents a method for the pursuit-evasion problem that incorporates the behaviors of the opponent. The method is built on a continuous-time Markov decision process (CTMDP) model, which differs from a Markov decision process (MDP) in that it accounts for the transition time between states. By introducing the concept of a situation, the probabilities describing average behaviors are obtained, and these probabilities are then used to construct the transition matrix of the CTMDP. A policy iteration method for solving the CTMDP is also given. To demonstrate the CTMDP method for pursuit-evasion, examples in a grid environment are computed. The CTMDP-based method offers a new approach to pursuit-evasion modeling and may be extended to similar sequential decision problems.
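The policy iteration step mentioned in the abstract can be illustrated with a minimal sketch. The paper's own formulation is not reproduced on this page, so the code assumes a generic discounted CTMDP: a transition-rate matrix Q[a] per action (rows sum to zero), a reward rate r[a][s], and a discount rate alpha. The two-state "free / captured" model, its numbers, and all function and variable names are hypothetical and chosen only for illustration; the situation-based transition probabilities described in the abstract are not modeled here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def policy_iteration(Q: List[Matrix], r: Matrix, alpha: float,
                     max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Policy iteration for a discounted CTMDP (generic textbook form).

    Q[a][s][j]: transition rate from s to j under action a.
    r[a][s]:    reward rate in state s under action a.
    alpha:      discount rate, alpha > 0.
    """
    n_actions, n_states = len(Q), len(Q[0])
    policy = [0] * n_states
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Policy evaluation: solve (alpha * I - Q_pi) V = r_pi.
        A = [[(alpha if s == j else 0.0) - Q[policy[s]][s][j]
              for j in range(n_states)] for s in range(n_states)]
        b = [r[policy[s]][s] for s in range(n_states)]
        V = solve_linear(A, b)

        # Policy improvement: maximize r(s, a) + sum_j q(j | s, a) V(j).
        def score(s: int, a: int) -> float:
            return r[a][s] + sum(Q[a][s][j] * V[j] for j in range(n_states))

        new_policy = []
        for s in range(n_states):
            best = policy[s]  # keep the current action on ties
            for a in range(n_actions):
                if score(s, a) > score(s, best) + 1e-12:
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, V


# Hypothetical pursuer model: state 0 = evader free, state 1 = captured.
# Action 0 is a slow pursuit (capture rate 1, no effort cost);
# action 1 is a fast pursuit (capture rate 3, effort cost rate 0.5).
Q = [
    [[-1.0, 1.0], [0.0, 0.0]],
    [[-3.0, 3.0], [0.0, 0.0]],
]
r = [
    [0.0, 1.0],
    [-0.5, 1.0],
]
policy, values = policy_iteration(Q, r, alpha=0.5)
print(policy, values)
```

In this toy model the transition rates enter the evaluation step directly, which is where the time spent between states affects the value of a policy. A discrete-time MDP would use one-step transition probabilities in that place instead.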

Similar resources

Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games

We study two-player pursuit-evasion games with concurrent moves, infinite horizon, and discounted rewards. The players have partial observability; however, the evader is given the advantage of knowing the current position of the units of the pursuer. We show that (1) value functions of this game depend only on the position of the pursuing units and the belief the pursuer has about the position o...

A Model-Based Approach to Optimizing Ms. Pac-Man Game Strategies in Real Time

This paper presents a model-based approach for computing real-time optimal decision strategies in the pursuit-evasion game of Ms. Pac-Man. The game of Ms. Pac-Man is an excellent benchmark pursuit-evasion problem with multiple, active adversaries that adapt their pursuit policies based on Ms. Pac-Man’s state and decisions. In addition to evading the adversaries, the agent must pursue mul...

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic nature of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...

A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

In this work, we propose a new fuzzy reinforcement learning algorithm for differential games that have continuous state and action spaces. The proposed algorithm uses function approximation systems whose parameters are updated differently from the updating mechanisms used in the algorithms proposed in the literature. Unlike the algorithms presented in the literature which use the direct algorit...

The temporal derivative of expected utility: A neural mechanism for dynamic decision-making

Real world tasks involving moving targets, such as driving a vehicle, are performed based on continuous decisions thought to depend upon the temporal derivative of the expected utility (∂V/∂t), where the expected utility (V) is the effective value of a future reward. However, the neural mechanisms that underlie dynamic decision-making are not well understood. This study investigates human neura...


Publication date: 2014